Abstract. Most software verification tools can be classified into one of
a number of established families, each of which has its own focus and
strengths: for example, concrete counterexample generation in model
checking, invariant inference in abstract interpretation, and completeness
via annotation in deductive verification. This creates a significant and
fundamental usability problem, as users may have to learn and use one
technique to find potential problems but then need an entirely different
one to show that they have been fixed. This paper presents a single,
unified algorithm, kIkI, which strictly generalises abstract interpretation,
bounded model checking and k-induction. This not only combines the
strengths of these techniques but also allows them to interact and reinforce
each other, giving a ‘single-tool’ approach to verification.


1 Introduction 

The software verification literature contains a wide range of techniques which
can be used to prove or disprove safety properties. These include:

Bounded Model Checking Given sufficient time and resources, BMC will give
counterexamples for all false properties, which are often of significant value
for understanding the fault. However, only a small proportion of true
properties can be proven by BMC.

k-Induction Generalising Hoare logic’s ideas of loop invariants, k-induction can
prove true properties and, in some cases, provide counterexamples to false
ones. However, it requires inductive invariants, which can be expensive (in
terms of user time, expertise and maintenance).

Abstract Interpretation The use of over-approximations makes it easy to
compute invariants which allow many true propositions to be proven.
However, false properties and true-but-not-provable properties may be
indistinguishable. Tools may have limited support for a more complete analysis.

The range and variety of tools and techniques available is a sign of a healthy
and vibrant research community but presents challenges for non-expert users.

* This research was supported by the ARTEMIS Joint Undertaking under grant 
agreement number 295311 (VeTeSS), the Toyota Motor Corporation and ERC 
project 280053 (CPROVER). 



The choice of which tools to use and where to expend effort depends on whether 
the properties are true or not - which is exactly what they want to find out. 

To build a robust and usable software verification system it is necessary
to combine a variety of techniques. One option would be to run a series of
independent tools, in parallel (as a portfolio, for example) or in some sequential
order. However, this limits the information that can be exchanged between the
algorithms - what is needed is a genuine compound rather than a simple mixture.
Another option would be to use monolithic algorithms such as CEGAR [5],
IMPACT [31] or IC3/PDR [3,18], which combine some of the ideas of simpler
systems. These are difficult to implement well as their components interact in
complex and subtle ways. They also require advanced solver features, such as
interpolant generation, that are not widely available for all theories (bit-vectors,
arrays, floating-point, etc.). In this paper, we argue for a compound with simple
components and well-understood interaction.

This paper draws together a range of well-known techniques and combines
them in a novel way so that they strengthen and reinforce each other. k-induction
[25] uses syntactically restricted or simple invariants (such as those generated by
abstract interpretation) to prove safety. Bounded model checking [2] allows us
to test k-induction failures to see if they are real counterexamples or, if not, to
build up a set of assumptions about system behaviour. Template-based abstract
interpretation is used for invariant inference [25,24,16], with unrolling
producing progressively stronger invariants. Using a solver and templates to generate
invariants allows the assumptions to be used without the need for backwards
propagators and ‘closes the loop’, allowing the techniques to strengthen each
other. Specifically, the paper makes the following contributions:

1. A new, unified, simple and elegant algorithm, kIkI, for integrated invariant
inference and counterexample generation is presented in Section 2.
Incremental bounded model checking, k-induction and classical over-approximating
abstract interpretation are shown to be restrictions of kIkI.

2. The techniques required to efficiently implement kIkI are given in Section 3
and an implementation, 2LS, is described in Section 4.

3. A series of experiments is given in Section 5. We show that kIkI verifies
more programs and is faster than a portfolio approach using incremental
BMC, k-induction and abstract interpretation, showing genuine synergy
between components.

2 Algorithm Concepts 

This section reviews the key concepts behind kIkI. Basic familiarity with
transition systems and first and second order logic will be assumed. As we intend to
use kIkI to verify software using bit-vectors, we will focus on finite state systems.

2.1 Program Verification as Second Order Logic 

To ease formalisation we view programs as symbolic transition systems. The
state of a program is described by a logical interpretation with logical variables
corresponding to each program variable, including the program counter.
Formulae can be used to describe sets of states - the states in the set are the models
of the formulae. Given x, a vector of variables, Start(x) is the predicate
describing the start states. A transition relation, Trans(x, x'), is a formula describing a
relation between pairs of such interpretations which describes the (potentially
non-deterministic) progression relations between states. From these we can
derive the set of reachable states as the least fixed-point of the transition relation
starting from the states described by Start(x). Although this set is easily
defined, computing a predicate that describes it (from Start and Trans) is often
difficult and we will focus on the case when it is not practical. Instead, inductive
invariants are used; Inv is an inductive invariant if it has the following property:

$$\forall x_0, x_1 .\ (Inv(x_0) \wedge Trans(x_0, x_1) \Rightarrow Inv(x_1)) \tag{1}$$

Each inductive invariant is a description of a fixed-point of the transition relation 
but is not necessarily guaranteed to be the least one, nor is it guaranteed to 
include Start(x) although many of the inductive invariants we use will do. For 
example, the predicate true is an inductive invariant for all systems as it describes 
the complete state space. From an inductive invariant we can find loop invariants 
and function and thread summaries by projecting on to a subset of variables x. 

Many verification tasks can be reduced to showing that the reachable states
do not intersect with a set of error states, denoted by the predicate Err(x).
Techniques for proving systems safe can be seen as computing an inductive invariant
that is disjoint from the error set. Using existential second order quantification
(denoted $\exists_2$) we can formalise this as:

$$\exists_2 Inv .\ \forall x_0, x_1 .\ (Start(x_0) \Rightarrow Inv(x_0)) \wedge (Inv(x_0) \wedge Trans(x_0, x_1) \Rightarrow Inv(x_1)) \wedge (Inv(x_0) \Rightarrow \neg Err(x_0)) \tag{2}$$

Alternatively, if the system is not safe, then there is a reachable error state. One
way of showing this is to find a concrete, n-step counterexample¹:

$$\exists x_0, \ldots, x_n .\ Start(x_0) \wedge \bigwedge_{i \in [0, n-1]} Trans(x_i, x_{i+1}) \wedge Err(x_n) \tag{3}$$
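As a hedged illustration of formulae (1) to (3), the following Python sketch checks them by brute-force enumeration on a small finite system. The 4-bit counter and the names STATES, start, trans, err, is_safety_invariant and find_counterexample are assumptions made for this example only; a real tool would pose these queries to a SAT or SMT solver instead of enumerating states.

```python
from itertools import product

# Hypothetical toy system: a 4-bit counter that counts up to 10 and then stays there.
STATES = range(16)

def start(x):
    return x == 0

def trans(x, y):
    return y == (x + 1 if x < 10 else x)

def err(x):
    return x > 10

def is_safety_invariant(inv):
    """Check the three conjuncts of formula (2) for a candidate predicate inv."""
    covers_start = all(inv(x) for x in STATES if start(x))
    inductive = all(inv(y) for x, y in product(STATES, STATES)
                    if inv(x) and trans(x, y))
    excludes_errors = all(not err(x) for x in STATES if inv(x))
    return covers_start and inductive and excludes_errors

def find_counterexample(n):
    """Search for a model of formula (3): n transitions from Start ending in Err."""
    paths = [[x] for x in STATES if start(x)]
    for _ in range(n):
        paths = [p + [y] for p in paths for y in STATES if trans(p[-1], y)]
    return next((p for p in paths if err(p[-1])), None)

print(is_safety_invariant(lambda x: x <= 10))  # candidate satisfying all of (2)
print(is_safety_invariant(lambda x: True))     # inductive, but includes error states
print(find_counterexample(12))                 # this toy system has no counterexample
```

The predicate true passes the inductiveness conjunct but fails the third conjunct of (2), which matches the remark that an inductive invariant need not be the least fixed-point.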


2.2 Existing Techniques 

Viewing program verification as existential second-order logic allows a range of 
existing tools to be characterised in a common framework and thus compared 
and contrasted. This section reviews some of the more widely used approaches. 

1 If the state space is finite and the system is not safe there is necessarily a finite,
concrete counterexample. For infinite state spaces there are additional issues such
as errors only reachable via infinite counterexamples and which fixed-points can be
described by a finite formula.



The following abbreviations, corresponding to k steps of the transition system
and the first k states being error free, will be used:

$$T[k] = \bigwedge_{i \in [0, k-1]} Trans(x_i, x_{i+1}) \qquad P[k] = \bigwedge_{i \in [0, k-1]} \neg Err(x_i)$$

Bounded Model Checking (BMC) [2] focuses on refutation by picking an unwinding
limit k and solving:

$$\exists x_0, \ldots, x_k .\ Start(x_0) \wedge T[k] \wedge \neg P[k+1] \tag{4}$$

Models of this formula correspond to concrete counterexamples of some length
n ≤ k. The unwinding limit gives an under-approximation of the set of
reachable states and thus can fail to find counterexamples that take a large number
of transition steps. In practice BMC works well as the formula is existentially
quantified and thus is in a fragment handled well by SAT and SMT solvers.
There are also various simplifications that can reduce the number of variables
(see Section 3.1).

Incremental BMC (IBMC) (e.g. [10]) uses repeated BMC checks (often optimised by
using the solver incrementally) with increasing bounds to avoid the need
for a fixed bound. If the bound starts at 0 (i.e. checking $\exists x_0 .\ Start(x_0) \wedge Err(x_0)$)
and is increased linearly (this is the common use-case), then it can be assumed
that there are no errors at previous states, giving a simpler test:

$$\exists x_0, \ldots, x_k .\ Start(x_0) \wedge T[k] \wedge P[k] \wedge Err(x_k) \tag{5}$$
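The incremental check of formula (5) can be sketched as follows. This is a minimal explicit-state illustration in Python, not how a SAT-based tool works: path enumeration stands in for the solver, and the function incremental_bmc, the helper count_up and both example properties are hypothetical.

```python
def incremental_bmc(states, start, trans, err, max_k):
    """Return the shortest error trace with at most max_k transitions, or None."""
    states = list(states)
    paths = [[x] for x in states if start(x)]        # k = 0: Start(x_0)
    for k in range(max_k + 1):
        for p in paths:
            if err(p[-1]):                            # a model of formula (5)
                return p
        # No error was found at step k, so every stored path is error free and
        # P[k+1] holds for each extension by one more transition.
        paths = [p + [y] for p in paths for y in states if trans(p[-1], y)]
    return None

def count_up(x, y):
    """Hypothetical transition relation: count from 0 to 10, then stay at 10."""
    return y == (x + 1 if x < 10 else x)

buggy = incremental_bmc(range(16), lambda x: x == 0, count_up, lambda x: x == 7, 20)
safe = incremental_bmc(range(16), lambda x: x == 0, count_up, lambda x: x > 10, 20)
print(buggy, safe)
```

For the second property the search only reports that no counterexample exists up to the bound; it cannot conclude safety, which is the limitation that k-induction addresses.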

k-Induction [25] can be viewed as an extension of IBMC that can show system
safety as well as produce counterexamples. It makes use of k-inductive invariants,
which are predicates that have the following property:

$$\forall x_0 \ldots x_k .\ I[k] \wedge T[k] \Rightarrow KInv(x_k) \tag{6}$$

where

$$I[k] = \bigwedge_{i \in [0, k-1]} KInv(x_i)$$

k-inductive invariants have the following useful properties:

— Any inductive invariant is a 1-inductive invariant and vice versa.

— Any k-inductive invariant is a (k + 1)-inductive invariant.

— A (finite) system is safe if and only if there is a k-inductive invariant KInv
which satisfies:

$$\forall x_0 \ldots x_k .\ (Start(x_0) \wedge T[k] \Rightarrow I[k]) \wedge (I[k] \wedge T[k] \Rightarrow KInv(x_k)) \wedge (KInv(x_k) \Rightarrow \neg Err(x_k)) \tag{7}$$


Showing that a k-inductive invariant exists is sufficient to show that an
inductive invariant exists but it does not imply that the k-inductive invariant is an
inductive invariant. Often the corresponding inductive invariant is significantly
more complex. Thus k-induction can be seen as a trade-off between invariant
generation and checking as it is a means to benefit as much as possible from
simpler invariants by using a more complex property check.

Finding a candidate k-inductive invariant is hard so implementations often
use ¬Err(x). Similarly to IBMC, linearly increasing k can be used to simplify
the expression by assuming there are no errors at previous states:

$$\exists x_0, \ldots, x_k .\ (Start(x_0) \wedge T[k] \wedge P[k] \wedge Err(x_k)) \vee (T[k] \wedge P[k] \wedge Err(x_k)) \tag{8}$$

A model of the first disjunct is a concrete counterexample
(k-induction subsumes IBMC) and if the whole formula has no models, then ¬Err(x)
is a k-inductive invariant and the system is safe.
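A minimal sketch of this scheme, assuming an explicit finite state space, is given below. Both disjuncts of formula (8) are decided by path enumeration rather than by a solver; the functions error_paths and k_induction and the six-state system NEXT are hypothetical and chosen so that ¬Err is not 1-inductive but becomes k-inductive for a larger k.

```python
def error_paths(states, trans, err, k, initial):
    """Models of T[k] and P[k] and Err(x_k) whose first state satisfies `initial`."""
    paths = [[x] for x in states if initial(x)]
    for _ in range(k):
        # Only error-free states are extended, which enforces P[k].
        paths = [p + [y] for p in paths if not err(p[-1])
                 for y in states if trans(p[-1], y)]
    return [p for p in paths if err(p[-1])]

def k_induction(states, start, trans, err, max_k):
    states = list(states)
    for k in range(max_k + 1):
        base = error_paths(states, trans, err, k, start)        # first disjunct of (8)
        if base:
            return ("unsafe", k, base[0])
        step = error_paths(states, trans, err, k, lambda x: True)  # second disjunct
        if not step:
            return ("safe", k, None)       # not Err is a k-inductive invariant
    return ("unknown", max_k, None)

# Hypothetical system: 0 -> 1 -> 2 -> 0 is reachable; 5 -> 6 -> 7 is not.
NEXT = {0: 1, 1: 2, 2: 0, 5: 6, 6: 7, 7: 7}
STATES = sorted(NEXT)

def start(x):
    return x == 0

def trans(x, y):
    return NEXT[x] == y

proof = k_induction(STATES, start, trans, lambda x: x == 7, 10)
refutation = k_induction(STATES, start, trans, lambda x: x == 2, 10)
print(proof, refutation)
```

In the first call the unreachable chain 5, 6, 7 defeats the step case for k = 1 and k = 2, so the proof only succeeds at k = 3; this is the kind of case where a stronger invariant would allow a smaller k.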

Abstract Interpretation [Bj While BMC and IBMC compute under-approximations 
of the set of reachable states, the classical use of abstract interpretation is to
compute inductive invariants that include Start(x) and thus are over-approximations
of the set of reachable states. Elements of an abstract domain can be understood
as sets or conjuncts of formulae [8], so abstract interpretation can be seen as:

$$\exists_2 AInv \in \mathcal{A} .\ \forall x, x_1 .\ (Start(x) \Rightarrow AInv(x)) \wedge (AInv(x) \wedge Trans(x, x_1) \Rightarrow AInv(x_1)) \tag{9}$$

where $\mathcal{A}$ is the set of formulae described by the chosen abstract domain. As a
second step one then checks:

$$\forall x .\ AInv(x) \Rightarrow \neg Err(x) \tag{10}$$

If this holds (that is, $AInv(x) \wedge Err(x)$ has no models) then the system is safe;
otherwise safety cannot be determined without finding a more restrictive AInv or
increasing the set $\mathcal{A}$, i.e. choosing a more expressive abstract domain.

2.3 Our Algorithm: kIkI

The phases of the kIkI algorithm are presented as a flow chart in Figure 1 with
black arrows denoting transitions. Initially, k = 1 and $\mathcal{T}$ is a set of predicates
that can be used as invariants with $\top \in \mathcal{T}$ (see Section 3 for details of how this
is implemented).

After an initial test to see if any start states are errors², kIkI computes a
k-inductive invariant that covers the initial state and includes the assumption
that there are no errors in earlier states. The invariant is then checked to see

2 If the transition system is derived from software and the errors are generated from
assertions this will be impossible and the check can be skipped.




Fig. 1: The kIkI algorithm (colours in online version)


whether it is sufficient to show safety. If there are possible reachable error states
then a second check is needed to see if the error is reachable in k steps (a genuine
counterexample) or whether it is a potential artefact of an invariant that is too
weak. In the latter case, k is incremented so that a stronger (k-)invariant can be found
and the algorithm loops.
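The loop described above can be sketched for small explicit-state systems as follows. This is an illustration under strong assumptions and not the 2LS implementation: the template is a single interval over one integer variable, all SAT queries are replaced by path enumeration, and the names paths, inside, join, infer_k_invariant and kiki are hypothetical.

```python
def paths(states, trans, k, first, keep):
    """Enumerate x_0..x_k with T[k], first(x_0), and keep(x_i) for every i < k."""
    result = [[x] for x in states if first(x)]
    for _ in range(k):
        result = [p + [y] for p in result if keep(p[-1])
                  for y in states if trans(p[-1], y)]
    return result

def inside(inv, x):
    """Interval template: inv is None (bottom) or a pair (lo, hi)."""
    return inv is not None and inv[0] <= x <= inv[1]

def join(inv, xs):
    vals = list(xs) + ([] if inv is None else list(inv))
    return (min(vals), max(vals))

def infer_k_invariant(states, start, trans, err, k):
    """Least interval that is k-inductive, assuming no errors in earlier states."""
    ok = lambda x: not err(x)
    # States reachable in fewer than k steps must be covered by the invariant.
    reach = {x for j in range(k) for p in paths(states, trans, j, start, ok) for x in p}
    inv = join(None, reach) if reach else None
    while True:
        good = lambda x: ok(x) and inside(inv, x)
        escaping = {p[-1] for p in paths(states, trans, k, good, good)
                    if not inside(inv, p[-1])}
        if not escaping:
            return inv
        inv = join(inv, escaping)

def kiki(states, start, trans, err, max_k):
    states = list(states)
    if any(start(x) and err(x) for x in states):          # initial test
        return ("unsafe", 0, None)
    ok = lambda x: not err(x)
    for k in range(1, max_k + 1):
        inv = infer_k_invariant(states, start, trans, err, k)
        good = lambda x: ok(x) and inside(inv, x)
        if not any(err(p[-1]) for p in paths(states, trans, k, good, good)):
            return ("safe", k, inv)                        # invariant shows safety
        if any(err(p[-1]) for p in paths(states, trans, k, start, ok)):
            return ("unsafe", k, None)                     # genuine counterexample
    return ("unknown", max_k, None)

# Hypothetical system: 0..10 counts up and stops; 11 -> ... -> 15 is unreachable.
def trans(x, y):
    return y == (x if x in (10, 15) else x + 1)

start = lambda x: x == 0
proof = kiki(range(16), start, trans, lambda x: x == 15, 8)
refutation = kiki(range(16), start, trans, lambda x: x == 3, 8)
print(proof, refutation)
```

On the first property the interval invariant excludes the unreachable chain 11 to 15, so the proof succeeds with k = 1, whereas using ¬Err alone as the candidate would require a larger k on this system.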

Also displayed in Figure 1 are the steps of incremental BMC, k-induction
and classical over-approximating abstract interpretation, given, respectively, by
the red dotted, blue dashed and green dashed/dotted boxes and arrows. kIkI
can simulate k-induction by having $\mathcal{T} = \{\top\}$ and incremental BMC by
over-approximating the first SAT check. Classical over-approximating abstract
interpretation can be simulated by having $\mathcal{T} = \mathcal{A}$ and terminating with the result
“unknown” if the first SAT check finds a model. These simulations give an
intuition for the proof of the following results:

Theorem 1. 

— When kIkI terminates it gives either a k-inductive invariant sufficient to
show safety or a length k counterexample.


















void main()
{                        guard#0 == TRUE
  unsigned x = 0;        x#0 == 0u
  while (x<10)           guard#1 == guard#0
  {                      x#phi1 == (guard#ls0 ? x#lb1 : x#0)
                         guard#2 == (x#phi1 < 10) && guard#1
    x++;                 x#2 == 1u + x#phi1
  }
                         guard#3 == !(x#phi1 < 10) && guard#1
  assert(x == 10);       x#phi1 == 10u || !guard#3
}

(a) The program          (b) The annotated SSA

Fig. 2: Conversion from program to SSA


— If IBMC or k-induction terminate with a length k counterexample, then kIkI
will terminate with a length k counterexample.

— If k-induction terminates with a k-inductive invariant sufficient to show
safety, then kIkI will terminate with a k-inductive invariant sufficient to
show safety.

— If an (over-approximating) abstract interpreter returns an inductive
invariant AInv that is sufficient to show safety and $\mathcal{A} \subseteq \mathcal{T}$, then kIkI will
terminate with k = 1 and an inductive invariant sufficient to show safety.

Hence kIkI strictly generalises its components by exploiting the following
synergies between them: unrolling k times helps abstract interpretation to
generate stronger invariants, namely k-invariants, which are further strengthened
by the additional facts known from not having found a counterexample for k - 1
iterations; stronger invariants help k-induction to successfully prove properties
more often; and constraining the state space by invariants ultimately accelerates
the countermodel search in BMC. We will also observe these synergies
experimentally in Section 5.

3 Algorithm Details 

Section 2 introduced kIkI but omitted a number of details which are important
for implementing the algorithm efficiently. Key amongst these are the encoding
from program to transition system and the generation of k-inductive invariants.

3.1 SSA Encoding 

The presentation of kIkI used transition systems and it is possible to implement
this directly. However, the symbolic transition systems generated by software
have structural properties that can be exploited. In most states the value of
the program counter uniquely identifies its next value (i.e. most instructions
do not branch) and most transitions update a single variable. Thus states in




(a) The SSA form of a loop: before the loop (x#0) -> loop head multiplexer
-> loop body -> end of loop body (x#2) -> after loop, with the feedback
variable (x#lb1) entering the loop head multiplexer.

(b) The SSA loop unwinding: before the loop (x#0) -> loop head 1 multiplexer
(x#phi1%0) -> loop body 1 -> end of loop body 1 (x#2%0) -> loop head 0
(x#phi1%1 = x#2%0) -> loop body 0 -> end of loop body 0 (x#2%1) -> merge loop
exits -> after loop, with the feedback variable (x#lb1) entering the loop
head 1 multiplexer.

Fig. 3: Illustrations of various SSA encodings


the transition can be merged by substituting in the symbolic values of updated 
variables, so reducing the size of the formulae generated. 

Rather than building the transition system and then reducing it, it is
equivalent and more efficient to convert the program to static single assignment
form (SSA). For acyclic code, the SSA is a formula that exactly represents the
strongest postcondition of running the code and generation of this is a standard
technique found in most software BMC and symbolic execution tools. We
extend this with an over-approximate conversion of loops so that the SSA allows
us to reason about abstractions of a program with a solver.

Figure 2 gives an example of the conversion. The SSA has been made acyclic
by cutting loops at the end of the loop body: the variable³ x#2 at the end of the
loop body (“poststate”) corresponds to x#lb1, which is fed back into the loop
head (“prestate”). A non-deterministic choice (using the free Boolean variable
guard#ls0) is introduced at the loop head in order to join the values coming
from before the loop and from the end of the loop body. Figure 3a illustrates
how the SSA statements express control flow.

It is easy to see that this representation “havocs” loops because x#lb1 is a
free variable - this is why its models are an over-approximation of actual program
traces. Precision can be improved by constraining the feedback variable x#lb1
by means of a loop invariant which we are going to infer. Any property that
holds at loop entry (x#0) and at the end of the body (x#2) can then be assumed
to hold on the feedback variable x#lb1.
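The effect of constraining the feedback variable can be checked on the SSA of Figure 2 with a small enumeration. The sketch below is an assumption-laden illustration: it uses Python integers in the range 0 to 15 instead of 32-bit bit-vectors (so wrap-around is ignored), and the function names ssa_models, assertion_holds and invariant_preserved are hypothetical.

```python
def ssa_models(lb_values):
    """Enumerate models of the SSA in Fig. 2 for the given candidate values of x#lb1."""
    for guard_ls0 in (False, True):          # free Boolean choice at the loop head
        for x_lb1 in lb_values:              # free feedback variable
            guard0, x0 = True, 0
            guard1 = guard0
            x_phi1 = x_lb1 if guard_ls0 else x0
            guard2 = (x_phi1 < 10) and guard1
            x2 = 1 + x_phi1
            guard3 = (not (x_phi1 < 10)) and guard1
            assertion = (x_phi1 == 10) or not guard3
            yield {"x_phi1": x_phi1, "guard2": guard2, "x2": x2,
                   "assertion": assertion}

def assertion_holds(lb_values):
    return all(m["assertion"] for m in ssa_models(lb_values))

def invariant_preserved(lb_values, bound):
    """x <= bound holds at loop entry (x#0 == 0) and at the end of the body (x#2)."""
    return 0 <= bound and all(m["x2"] <= bound
                              for m in ssa_models(lb_values) if m["guard2"])

havoc = assertion_holds(range(16))        # x#lb1 left unconstrained
constrained = assertion_holds(range(11))  # x#lb1 <= 10 assumed as loop invariant
print(havoc, constrained, invariant_preserved(range(11), 10))
```

With x#lb1 unconstrained, a value such as 11 reaches the assertion and violates it, although no real execution does; once x#lb1 is restricted by the invariant x ≤ 10, which holds at x#0 and is preserved at x#2, the assertion holds in every model.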

Loop unwinding is performed in the usual fashion; the conversion to SSA
simply repeats the conversion of the body of the loop. Figure 3b illustrates an
example of this. The top-most loop head multiplexer is kept and its feedback
variable is constrained with the bottom-most loop unwinding. The only subtlety

3 Variable name suffixes are used to denote the multiple logical variables that correspond
to a single program variable at different points in the execution.




is that the value of variables from different loop exits must be merged. This 
can be achieved by use of the guard variables which track the reachability of 
various program points for a given set of values. The unwinding that we perform 
is incremental, in the sense that the construction of the formula is monotonic. 
Assumptions have to be used to deal with the end of loop merges as there always 
has to be a case for “value is merged from an unwinding that has not been added 
yet” and this has to be assumed false. 

A more significant example is given in Appendix A.


3.2 Invariant Inference via Templates 

A key phase of kIkI is the generation of KInv, a k-inductive invariant. Perhaps
the most obvious approach is to use an off-the-shelf abstract interpreter. This
works but will fail to exploit the real power of kIkI. In each iteration, kIkI unrolls
loops one more step (which can improve the invariant given by an abstract
interpreter) and adds assumptions that previous unwindings do not give errors.
Without backwards propagation it is difficult for an abstract interpreter to make
significant use of these assumptions. For example, an abstract interpretation
with intervals would need backwards propagation to make use of assume(x +
y < 10). Thus we use a solver-based approach to computing KInv as it can
elegantly exploit the assumptions that are added without needing to (directly)
implement transformers.

Directly using a solver we would need to handle (the existential fragment of)
second-order logic. As solvers for this fragment are not currently available, we
reduce to a problem that can be solved by iterative application of a first-order
solver. We restrict ourselves to finding invariants KInv of the form
$\mathcal{T}(x, \delta)$ where $\mathcal{T}$ is a fixed expression, a so-called template, over
program variables x and template parameters δ (see Section 3.3). This restriction
is analogous to choosing an abstract domain in an abstract interpreter and has a
similar effect - $\mathcal{T}$ only contains the formulae that can be described by the
template. Fixing a template reduces the second-order search for an invariant to
the first-order search for template parameters:

$$\exists \delta .\ \forall x_0 \ldots x_k .\ (Start(x_0) \wedge T[k] \Rightarrow \mathcal{T}[k](\delta)) \wedge (\mathcal{T}[k](\delta) \wedge T[k] \Rightarrow \mathcal{T}(x_k, \delta)) \tag{11}$$

with $\mathcal{T}[k](\delta) = \bigwedge_{i \in [0, k-1]} \mathcal{T}(x_i, \delta)$. Although the problem is now expressible
in first-order logic, it contains quantifier alternation which poses a problem for
current SMT solvers. However, we can solve this problem by iteratively checking
the negated formula (to turn ∀ into ∃) for different choices of constants d for the
parameters δ; for the second conjunct in (11), for example:

$$\exists x_0 \ldots x_k .\ \neg(\mathcal{T}[k](d) \wedge T[k] \Rightarrow \mathcal{T}(x_k, d)) \tag{12}$$

The resulting formula can be expressed in quantifier-free logics and efficiently
solved by SMT solvers. Using this as a building block, one can solve this ∃∀
problem (see Section 3.4).


3.3 Guarded Template Domains 

As discussed in the previous section, we use templates and repeated calls (with
quantifier-free formulae) to a first-order solver to compute k-inductive invariants.

An abstract value d represents, i.e. concretises to, the set of all x that satisfy
the formula $\mathcal{T}(x, d)$. We require an abstract value ⊥ denoting the empty set,
$\mathcal{T}(x, \bot) = false$, and ⊤ for the whole domain of x: $\mathcal{T}(x, \top) = true$.


Template polyhedra We use template polyhedra [25], a class of templates for
numerical variables which have the form $\mathcal{T} = (Ax \le \delta)$ where A is a matrix
with fixed coefficients. Subclasses of such templates include Intervals, which
require constraints $\begin{pmatrix} 1 \\ -1 \end{pmatrix} x_i \le \begin{pmatrix} \delta_{i1} \\ \delta_{i2} \end{pmatrix}$ for each variable $x_i$, Zones (differences),
and Octagons [22]. The r-th row of the template is the constraint generated by
the r-th row of matrix A.
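To make the shape of these templates concrete, the sketch below writes out the matrix A for two variables (x, y) in the interval, zone and octagon subclasses and checks Ax ≤ d row by row. It uses unbounded Python integers, so the bit-vector promotion discussed next is not modelled; the names satisfies, INTERVALS, ZONES and OCTAGONS and the parameter values are hypothetical.

```python
def satisfies(A, d, x):
    """True if A x <= d holds for every template row."""
    return all(sum(a * v for a, v in zip(row, x)) <= bound
               for row, bound in zip(A, d))

# Fixed coefficient matrices over the variables (x, y).
INTERVALS = [[1, 0], [-1, 0], [0, 1], [0, -1]]   # x <= d1, -x <= d2, y <= d3, -y <= d4
ZONES = INTERVALS + [[1, -1], [-1, 1]]           # adds x - y <= d5, y - x <= d6
OCTAGONS = ZONES + [[1, 1], [-1, -1]]            # adds x + y <= d7, -x - y <= d8

box = [10, 0, 5, 0]              # 0 <= x <= 10 and 0 <= y <= 5
zone = box + [0, 5]              # additionally x <= y and y - x <= 5

print(satisfies(INTERVALS, box, (3, 4)))
print(satisfies(ZONES, zone, (4, 3)))
```

Only the right-hand sides d vary during invariant inference; the rows of A are fixed in advance, which is what reduces the search to a search for parameter values.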

In our template expressions, variables x are bit-vectors representing signed
or unsigned integers. These variables can be mixed in template constraints. Type
promotion rules are applied such that the bit-width of the types of the
expressions are extended in order to avoid arithmetic under- and overflows in the
template expressions. ⊤ corresponds to the respective maximum values in the
promoted type, whereas ⊥ must be encoded as a special symbol.


Guarded templates Since we use SSA form rather than control flow graphs, we
cannot use numerical templates directly. Instead we use guarded templates. In a
guarded template each row r is of the form $G_r \Rightarrow \hat{T}_r$ for the r-th row $\hat{T}_r$ of the
base template domain (e.g. template polyhedra). $G_r$ is the conjunction of the
SSA guards $g_i$ associated with the definition of variables $x_i$ occurring in $\hat{T}_r$. $G_r$
denotes the guard associated to variables x appearing at the loop head, and $G'_r$
the guard associated to the variables x' at the end of the respective loop body.
Hence, template rows for different loops have different guards.

A guarded template in terms of the variables at the loop head is hence of the
form $\mathcal{T}(x_0, \delta) = \bigwedge_r G_r(x_0) \Rightarrow \hat{T}_r(x_0, \delta)$. Replacing parameters δ by the values
d we get the invariants $\mathcal{T}(x, d)$ at the loop heads.

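For the example program in Section 3.1 we have the following guarded
interval template:

$$\mathcal{T}(\mathtt{x\#lb1}, (\delta_1, \delta_2)) = \begin{cases} \mathtt{guard\#1} \wedge \mathtt{guard\#ls0} \Rightarrow \mathtt{x\#lb1} \le \delta_1 \\ \mathtt{guard\#1} \wedge \mathtt{guard\#ls0} \Rightarrow -\mathtt{x\#lb1} \le \delta_2 \end{cases}$$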

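We denote $\mathcal{T}'(x_1, \delta) = \bigwedge_r G'_r(x_1) \Rightarrow \hat{T}_r(x_1, \delta)$ the guarded template
expressed in terms of the variables at the end of the loop body. Here, we have to
express the join of the initial value at the loop head (like x#0) and the values
that are fed back into the loop head (like x#2). For the example above, the
corresponding guarded template is as follows:

$$\begin{cases} (pg \Leftrightarrow \mathtt{guard\#2}) \wedge (ig \Leftrightarrow \mathtt{guard\#1} \wedge \neg \mathtt{guard\#ls0}) \; \wedge \\ ((ig \Rightarrow x' = \mathtt{x\#0}) \wedge (pg \wedge \neg ig \Rightarrow x' = \mathtt{x\#2})) \; \wedge \\ (pg \vee ig \Rightarrow x' \le \delta_1) \wedge (pg \vee ig \Rightarrow -x' \le \delta_2) \end{cases}$$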


3.4 Accelerated Solving of the ∃∀ Problem

As discussed in Section 3.2, it is necessary to solve an ∃∀ problem to find values
for template parameters δ to infer invariants.

Model enumeration. The well-known method [24,4] for solving this problem in
formula (11) using SMT solvers repeatedly checks satisfiability of the formula
for an abstract value d (starting with d = ⊥):

$$\mathcal{T}[k](d) \wedge T[k] \wedge \neg \mathcal{T}(x_k, d) \tag{13}$$

If it is unsatisfiable, then we have found an invariant; otherwise we join the
model returned by the solver with the previous abstract value d.

However, this method corresponds to performing a classical Kleene iteration
on the abstract lattice up to convergence. Convergence is guaranteed because
our abstract domains are finite. However, the height of the lattice is enormous:
even for a one-loop program incrementing an unconstrained 64-bit integer
variable, the naive algorithm will not terminate within a human lifetime. Hence,
we are not going to use this method.
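The cost of the naive iteration can be illustrated on the loop `x = 0; while (x < n) x++` with an interval template and k = 1. In the sketch below the SMT query (13) is replaced by a brute-force search for a counterexample to inductiveness; the function kleene_interval is hypothetical and the start state x = 0 is assumed to have been joined into the abstract value already.

```python
def kleene_interval(n):
    """Naive model enumeration for `x = 0; while (x < n) x++` with an interval."""
    lo, hi = 0, 0      # abstract value after joining the start state x = 0
    calls = 0
    while True:
        calls += 1
        # Stand-in for query (13): a state x inside [lo, hi] whose successor
        # x + 1 falls outside the current interval.
        model = next((x + 1 for x in range(lo, hi + 1)
                      if x < n and x + 1 > hi), None)
        if model is None:
            return (lo, hi), calls       # unsatisfiable: the interval is inductive
        hi = max(hi, model)              # join the model with the abstract value

print(kleene_interval(10))
print(kleene_interval(1000))
```

Each query extends the upper bound by exactly one, so the number of queries grows linearly with n; for a 64-bit bound this is the non-terminating behaviour described above.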

Optimisation. What we need is a convergence acceleration that makes the
computational effort independent from the number of states and loop iterations. To
this end, we use a technique that is inspired by an encoding used by max-strategy
iteration methods [13,12,23]. These methods state the invariant inference
problem over template polyhedra as a disjunctive linear optimisation problem, which
is solved iteratively by an upward iteration in the lattice of template polyhedra:
using SMT solving, a conjunctive subsystem (“strategy”) whose solution extends
the current invariant candidate is selected. This subsystem is then solved by an
LP solver; the procedure terminates as soon as an inductive invariant is found.

This method can only be used if the domain is convex and the parameter
values are ordered and monotonic w.r.t. concretisation, which holds true, for
example, for template polyhedra Ax ≤ d where d is a parameter, but not
for those where A is a parameter. If the operations in the transition relation
satisfy certain properties such as monotonicity of condition predicates, then the
obtained result is the least fixed point, i.e. the same result as the one returned
by the naive model enumeration above, but much faster on average.


Our algorithm. We adapt this method to our setting with bit-vector variables 
and guarded templates. Since we deal with finite domains (bit-vectors) we can 
use binary search as optimisation method instead of an LP solver. 

The algorithm proceeds as follows: We start by checking whether the current
abstract value d (starting from d = ⊥) is inductive (Equ. (13)). If so, we have
found an invariant; otherwise there are template rows R whose values are not
inductive yet. We construct the system

$$\bigwedge_{i \in [0, k-1]} \Big( \bigwedge_{r \notin R} G_r(x_i) \Rightarrow (e_r(x_i) \le d_r) \;\wedge\; \bigwedge_{r \in R} G_r(x_i) \Rightarrow (e_r(x_i) \le \delta_r) \Big) \wedge T[k] \wedge \bigwedge_{r \in R} G'_r(x_k) \wedge (\delta_r \le e_r(x_k)) \tag{14}$$

r£R 






where $e_r$ is the left-hand side of the inequality corresponding to the r-th row of
the template. Then we start the binary search for the optimal value of $\sum_{r \in R} \delta_r$
over this system. The initial bounds for $\sum_{r \in R} \delta_r$ are as follows:

— The lower bound ℓ is $\sum_{r \in R} d'_r$ where $d'_r$ is the value of $e_r(x_k)$ in the model
of the inductivity check (13) above;

— The upper bound u is $\sum_{r \in R} max\_value(r)$ where max_value returns the
maximum value that $e_r(x_k)$ may have (dependent on variable type).

The binary search is performed by iteratively checking (14) for satisfiability
under the assumption $\sum_{r \in R} \delta_r \ge m$ where m = median(ℓ, u). If satisfiable, set
ℓ := m, otherwise set u := m and repeat until ℓ = u. The values of $\delta_r$ in the
last satisfiable query are assigned to $d_r$ to obtain the new abstract value. The
procedure is then repeated by testing whether d is inductive (13). Note that this
algorithm uses a similar encoding for bound optimisation as strategy iteration,
but potentially requires a higher number of iterations than strategy iteration.
This choice has been made deliberately in order to keep the size of the generated
SMT formulas small, at the cost of a potentially increased number of iterations.
A worked example is given in Appendix A.
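The binary search step can be sketched for a single template row as follows. Everything here is an assumption made for illustration: the oracle query is a brute-force stand-in for the satisfiability check of system (14) on the loop `x = 0; while (x < n) x++` with the row x ≤ δ and k = 1, the domain is 8-bit, and the search uses an upward-rounded midpoint with u := m - 1 on failure, a variant chosen here so that the integer search is guaranteed to terminate.

```python
def query(m, n=10, max_value=255):
    """Stand-in for the SMT check: is there a model with x <= delta, x < n,
    x' = x + 1, delta <= x' and delta >= m?"""
    return any(x <= delta and x < n and delta <= x + 1
               for delta in range(m, max_value + 1)
               for x in range(max_value + 1))

def optimise_bound(lower, upper, sat):
    """Largest m in [lower, upper] with sat(m), assuming sat(lower) holds and
    sat is monotone (true up to some threshold, false above it)."""
    calls = 0
    while lower < upper:
        mid = (lower + upper + 1) // 2    # round up so the interval always shrinks
        calls += 1
        if sat(mid):
            lower = mid
        else:
            upper = mid - 1
    return lower, calls

# Lower bound 1: value of x' in a model of the inductivity check for d = 0.
# Upper bound 255: maximum value of an 8-bit unsigned variable.
best, calls = optimise_bound(1, 255, query)
print(best, calls)
```

For this toy loop the search reaches the bound 10 with a number of queries that is logarithmic in the size of the value range, rather than one query per loop iteration as in the naive enumeration.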

4 Implementation 

We implemented kIkI in 2LS⁴, a verification tool built on the CPROVER
framework, using MiniSAT-2.2.0 as a back-end solver (although other SAT and SMT
solvers with incremental solving support can also be used). 2LS currently inlines
all functions when running kIkI. The techniques described in Section 3 enable
a single solver instance to be used where constraints and unwindings are added
incrementally. This is essential because kIkI makes thousands of solver calls for
invariant inference and property checks.

Our implementation is generic w.r.t. matrix A of the template polyhedral 
domain. In our experiments, we observed that very simple matrices A generating 
interval invariants are sufficient to compete with other state-of-the-art tools. 

The tool can handle unrestricted sequential C programs (with the exception 
of programs with irreducible control flow). However, currently, invariants are not 
inferred over array contents or dynamically allocated data structures. 

5 Experiments 

We performed a number of experiments to demonstrate the utility and
applicability of kIkI. All experiments were performed on an Intel Xeon X5667 at 3 GHz
running Fedora 20 with 64-bit binaries. Each individual run was limited to 13 GB

4 Version 0.2. The source code of the tool and instructions for its usage can be found at
http://www.cprover.org/wiki/doku.php?id=21s_for_program_analysis
In the experiments we ran it with the option --competition-mode.



of memory and 900 seconds of CPU time, enforced by the operating system
kernel. We took the loops meta-category (143 benchmarks) from the SV-COMP’15
benchmark set⁵.

5.1 kIkI Verifies More Programs Than the Algorithms it Simulates

Table 1 gives a comparison between 2LS running kIkI (column 6) and the same
system running as an incremental bounded model checker (IBMC) (column 2),
incremental k-induction (i.e. without invariant inference, column 3) and as an
abstract interpreter (AI) (column 4). kIkI is more complete than each of the
restricted modes. This is not self-evident since it could be much less efficient
and, thus, fail to solve the problems within the given time or memory limits.
k-induction can solve 60.8% of the benchmarks, 13 more than IBMC. 32% of the
benchmarks can be solved by abstract interpretation (bugs are only exposed if
they are reachable with 0 loop unwindings). kIkI solves 62.9% of the benchmarks,
proving 3 more properties than k-induction.
Table 1: Comparison between kIkI, the algorithms it subsumes, the portfolio,
and CPAchecker. The rows false alarms and false proofs indicate soundness bugs
of the tool implementations.


5.2 fclfcl is at Least as Good as Their Naive Portfolio 

To show that kIkI is more than a mixture of three techniques and that they
strengthen each other, consider column 5 of Table 1. This gives the results of
an ideal portfolio in which the three restricted techniques are run in parallel
and the portfolio terminates when the first returns a conclusive result. Thus
the CPU time taken is three times the time taken by the fastest technique for
each benchmark (in practice these could be run in parallel, giving a lower wall
clock time). In our setup, kIkI had a disadvantage as each component of the virtual
portfolio had the same memory restriction as kIkI, thus effectively giving the
portfolio three times as much memory.
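
As a small illustration of this accounting, the following C sketch computes the CPU time charged to such a portfolio from per-technique run times. The function name, the parameter layout and the convention of returning -1.0 when no technique is conclusive are assumptions made for this example; they are not part of 2LS or of the experimental scripts.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: CPU time of an ideal portfolio of n techniques.
 * All n techniques run until the fastest conclusive one finishes, so the
 * portfolio is charged n times that technique's run time.
 * Returns -1.0 if no technique gives a conclusive result. */
double portfolio_cpu_time(const double *times, const bool *conclusive, size_t n)
{
  double best = -1.0;
  for (size_t i = 0; i < n; i++) {
    if (conclusive[i] && (best < 0.0 || times[i] < best))
      best = times[i];
  }
  return best < 0.0 ? -1.0 : (double)n * best;
}
```

With three techniques, a benchmark solved in 5, 2 and 7 seconds respectively is charged 3 * 2 = 6 seconds of CPU time.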

5 http://sv-comp.sosy-lab.org/2015/benchmarks.php

Fig. 4: Runtime Comparison


Still, kIkI is slightly faster and more accurate than the portfolio, as can be seen
in Table 1. The scatter plot in Figure 4 shows the results for each benchmark:
one can observe that kIkI is up to one order of magnitude slower on many unsafe
benchmarks, which is due to the additional work of invariant inference
that kIkI has to perform in contrast to IBMC. However, note that kIkI is faster
than the portfolio on some safe benchmarks and even on one unsafe benchmark. This suggests
that kIkI is more than the sum of its parts.

5.3 kIkI is Comparable with State-of-the-Art Approaches

We compared our implementation of kIkI with CPAchecker (footnote 6) and ESBMC (footnote 7),
which uses k-induction. The results are shown in the last three columns in Table 1
and in the scatter plot in Figure 4. Additional results are given in Appendix B.
In comparison to CPAchecker, the winner of SVCOMP’15, our prototype of kIkI
is overall a bit slower and proves fewer properties (due to more timeouts), but as
Figure 4 shows, it significantly outperforms CPAchecker on most benchmarks.
ESBMC exposes fewer bugs, but proves many more properties and is significantly
faster. However, it has 6 times more soundness bugs than our implementation (footnote 8).
These results show that our prototype implementation of kIkI can keep up with
state-of-the-art verification tools.

6 Related Work 

Our work elucidates the connection between three well-studied techniques. Hence 
we can only give a brief overview of the vast amount of relevant literature. 

6 SVCOMP’15 version, http://cpachecker.sosy-lab.org/ 

7 SVCOMP’15 version, http://www.esbmc.org/ 

8 The two false alarms in our current implementation are due to limited support for 
dynamic memory allocation. 

Since it was observed [25] that k-induction for finite state systems (e.g. hardware
circuits) can be done by using an (incremental) SAT solver [ID], it has also
become increasingly popular in the software community as a tool for safety
proofs. Using SMT solvers, it has been applied to Lustre models [77] (monolithic
transition relations) and C programs 17] (multiple and nested loops).

The idea of synthesising abstractions with the help of solvers can be traced
back to predicate abstraction HD; Reps et al. ET] proposed a method for symbolically
computing best abstract transformers; these techniques were later refined
[41191291 for application to various template domains. Using binary search
for optimisation in this context was proposed by Gulwani et al. m- Similar techniques
using LP solving for optimisation originate from strategy iteration [13].
Recently, SMT modulo optimisation [27T2D] techniques were proposed that foster
application to invariant generation by optimisation.

k-induction often requires additional invariants to succeed, which can be obtained
by abstract interpretation. For example, Garoche et al. m use SMT
solving to infer intermediate invariants over templates for use in k-induction
of Lustre models. Like most of these approaches (except dj), they consider (linear)
arithmetic over rational numbers only, whereas our targets are C programs with
bit-vectors (representing machine integers, floating-point numbers, etc). Moreover,
they do not exploit the full power of the approach because they compute
only 1-invariants instead of k-invariants. Another distinguishing feature of our
algorithm is that it operates on a single logical representation and hence enables
maximum information reuse by incremental SAT solving using a single solver.

Formalising program analysis problems such as invariant inference in second
order logic and solving these formulae with generic solvers has been
considered by [15]. In this paper we provide an implementation that solves the
second order formula describing the invariant inference problem by reduction
to quantifier elimination of a first order formula. Our approach can also solve
other problems stated in [15], e.g., termination, by considering different abstract
domains, e.g., for ranking functions.

7 Conclusions 

This paper presents kIkI and shows that it can simulate incremental BMC, k-induction
and classical, over-approximating abstract interpretation. Experiments
performed with an implementation, 2LS, show that it is not only “more” complete
than each individual technique but also suggest that it is stronger than
their naive combination. In other words, the components of the algorithm synergistically
interact and enhance each other. Moreover, our combination enables a
clean, homogeneous, tightly integrated implementation rather than a loose, heterogeneous
combination of isolated building blocks or a pipeline of techniques
where each only strengthens the next.

There are many possible future directions for this work. Enhancing 2LS to
support additional kinds of templates, possibly including disjunctive templates,
and improving the optimisation techniques used for quantifier elimination is one
area of interest. In another direction, kIkI could be enhanced to support function
modular, intraprocedural, thread modular and possibly multi-threaded analysis.
Automatic refinement of the template domains is another tantalising possibility.

A A Worked Example

We explain the kIkI algorithm step by step on the following example:

void main() {
  int w=0, x, y, z;
  __CPROVER_assume(x==y && y==z && -10<=x && x<0);
  while(1) {
    z = -y;
    y = -x;
    w++;
    x = x + w;
    if(w % 2 != 1) w /= 3;
    if(x>=10) x = y = z = 0;
    assert(x<=z+3);
  }
}
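
To get a feeling for the example before looking at its encoding, the following C sketch replays the loop concretely for every initial value allowed by the assumption and checks the assertion after each iteration. This is bounded testing on plain `int` values, not one of the analyses described in this paper; the function names and the iteration bound are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical harness: run `iterations` loop iterations starting from
 * x == y == z == x0 and w == 0. Returns true iff the assertion
 * x <= z + 3 holds at the end of every iteration. */
bool assertion_holds_for(int x0, int iterations)
{
  int w = 0, x = x0, y = x0, z = x0;
  for (int i = 0; i < iterations; i++) {
    z = -y;
    y = -x;
    w++;
    x = x + w;
    if (w % 2 != 1) w /= 3;
    if (x >= 10) x = y = z = 0;
    if (!(x <= z + 3)) return false;
  }
  return true;
}

/* Check all initial states permitted by the assumption -10 <= x < 0. */
bool all_initial_states_ok(int iterations)
{
  for (int x0 = -10; x0 < 0; x0++) {
    if (!assertion_holds_for(x0, iterations)) return false;
  }
  return true;
}
```

A call such as `all_initial_states_ok(1000)` finds no violation, which is consistent with the property being true, but a bounded replay of this kind cannot by itself prove it for all iterations.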

SSA construction. We first perform a lightweight static analysis in order to translate
the program into our SSA form:

w#0 == 0
guard#1 == (x#0 < 0 && x#0 == y#0 && y#0 == z#0 && x#0 >= -10)

//loop head
w#phi1 == (guard#ls5 ? w#lb5 : w#0)
x#phi1 == (guard#ls5 ? x#lb5 : x#0)
y#phi1 == (guard#ls5 ? y#lb5 : y#0)
z#phi1 == (guard#ls5 ? z#lb5 : z#0)

guard#2 == TRUE && guard#1 //in the loop
z#2 == -y#phi1
y#2 == -x#phi1
w#2 == 1 + w#phi1
x#2 == w#2 + x#phi1

guard#3 == (!(w#2 % 2 == 1) && guard#2)
w#3 == w#2 / 3
w#phi4 == (guard#3 ? w#3 : w#2)

guard#4 == ((x#2 >= 10) && guard#2)
z#4 == 0
y#4 == z#4
x#4 == y#4
x#phi5 == (guard#4 ? x#4 : x#2)
y#phi5 == (guard#4 ? y#4 : y#2)
z#phi5 == (guard#4 ? z#4 : z#2)

guard#6 == !TRUE && guard#1 //after loop

guard#2 ==> 3 + z#phi5 >= x#phi5 //assertion





It is important to note here that the loop is cut at the end of the loop 
body in order to make the SSA acyclic. For this reason, we replace variables 
w#phi4, x#phi5, y#phi5, and z#phi5 by free variables w#lb5, x#lb5, y#lb5, 
and z#lb5 at the loop head. Since these variables are free we obtain the effect of 
“havocking” these loop variables. The invariants that we compute will constrain 
these variables. 

Invariant inference over the interval domain uses the following guarded template
on our example:

guard#2 && guard#ls5 ==>
  w#lb5 <= delta#11 && -((signed __CPROVER_bitvector[33])w#lb5) <= delta#12 &&
  x#lb5 <= delta#21 && -((signed __CPROVER_bitvector[33])x#lb5) <= delta#22 &&
  y#lb5 <= delta#31 && -((signed __CPROVER_bitvector[33])y#lb5) <= delta#32 &&
  z#lb5 <= delta#41 && -((signed __CPROVER_bitvector[33])z#lb5) <= delta#42

The casts such as (signed __CPROVER_bitvector[33])w#lb5 are necessary
to extend the bitwidth in order to prevent arithmetic overflows in template
expressions (which would be unsound). For intervals, the bitwidth extension
could be avoided, but our algorithms are generic for template polyhedra.
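
The need for the extension can be seen with plain C integers. The sketch below uses `int64_t` as a stand-in for the 33-bit vector type (an assumption made so that the example compiles with a standard C compiler) and evaluates one row pair of the interval template on a widened value. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Negate a 32-bit value after widening it. Negating INT32_MIN directly in
 * 32-bit arithmetic would overflow; in the wider type the result
 * 2147483648 is representable. */
int64_t widened_negation(int32_t v)
{
  return -(int64_t)v;
}

/* Evaluate the two interval template rows for one variable:
 *   v <= upper  &&  -v <= neg_lower
 * where neg_lower bounds the negated value, as in the template above. */
bool in_interval_template(int32_t v, int64_t upper, int64_t neg_lower)
{
  int64_t wide = (int64_t)v;
  return wide <= upper && -wide <= neg_lower;
}
```

For example, `widened_negation(INT32_MIN)` yields 2147483648, a value that does not fit into 32 bits and therefore could not be compared soundly against a bound without the cast.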

Invariant inference using intervals on the above program is not very precise. We
obtain the following result, which does not allow us to prove the property.

guard#2 && guard#ls5 ==>
  w#lb5 <= 2147483647 && -((signed __CPROVER_bitvector[33])w#lb5) <= 715827882 &&
  x#lb5 <= 9 && -((signed __CPROVER_bitvector[33])x#lb5) <= 2147483648 &&
  y#lb5 <= 2147483647 && -((signed __CPROVER_bitvector[33])y#lb5) <= 2147483648 &&
  z#lb5 <= 2147483647 && -((signed __CPROVER_bitvector[33])z#lb5) <= 2147483648
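
One way to see why these bounds are too weak is to ask whether the assertion x <= z + 3 follows from the intervals for x and z alone. Over a non-empty box with tight bounds, the worst case is the largest x together with the smallest z, so the check reduces to a single comparison. The C sketch below encodes only this simplified reasoning; it is not how 2LS decides the property, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Does x <= z + 3 hold for every x <= x_max and every z >= z_min?
 * The worst case is x == x_max and z == z_min. 64-bit arithmetic is used
 * so that z_min + 3 cannot overflow for 32-bit bounds. */
bool intervals_imply_assertion(int64_t x_max, int64_t z_min)
{
  return x_max <= z_min + 3;
}
```

With the bounds above (x at most 9, z at least -2147483648) the check fails, which matches the observation that the interval result does not prove the property.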

Loop unwinding. We perform incremental loop unwinding at the SSA formula level.

Since formula construction for incremental loop unwinding is non-monotonic
[26], we have to introduce Boolean variables such as enable#0 which allow us to
switch on/off certain parts of the formula as needed and use incremental SAT
solving under assumptions [9] to solve these formulae efficiently.
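
The effect of such enabling variables can be sketched in C as follows. Each constraint has the shape `enable ==> body`, so it only restricts the formula when its enabling variable is assumed true. The sketch evaluates already-known truth values of the bodies under a given set of assumptions; it is a toy illustration with hypothetical names, not a SAT solver and not the 2LS implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* A guarded constraint "enable#i ==> body". */
typedef struct {
  size_t enable; /* index i of the enabling variable */
  bool body;     /* truth value of the body under some candidate assignment */
} guarded_constraint;

/* The conjunction of all guarded constraints holds iff every constraint
 * whose enabling variable is assumed true has a true body. Constraints
 * with a false enabling variable are switched off. */
bool satisfied_under(const guarded_constraint *cs, size_t n,
                     const bool *assumptions)
{
  for (size_t i = 0; i < n; i++) {
    if (assumptions[cs[i].enable] && !cs[i].body) return false;
  }
  return true;
}
```

Switching an enabling variable from true to false retracts the corresponding part of the formula without rebuilding it, which is what makes the non-monotonic construction compatible with an incremental solver.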

w#0 == 0

//loop head of 0th unwinding
enable#0 ==> (guard#1%0 == (x#0 < 0 && x#0 == y#0 && y#0 == z#0 && x#0 >= -10))
enable#0 ==> (w#phi1%0 == (guard#ls5%0 ? w#lb5%0 : w#0))
enable#0 ==> (x#phi1%0 == (guard#ls5%0 ? x#lb5%0 : x#0))
enable#0 ==> (y#phi1%0 == (guard#ls5%0 ? y#lb5%0 : y#0))
enable#0 ==> (z#phi1%0 == (guard#ls5%0 ? z#lb5%0 : z#0))

//last unwinding
guard#2%0 == (TRUE && guard#1%0)
z#2%0 == -y#phi1%0
y#2%0 == -x#phi1%0
w#2%0 == 1 + w#phi1%0
x#2%0 == w#2%0 + x#phi1%0

guard#3%0 == (!(w#2%0 % 2 == 1) && guard#2%0)
w#3%0 == w#2%0 / 3
w#phi4%0 == (guard#3%0 ? w#3%0 : w#2%0)

guard#4%0 == ((x#2%0 >= 10) && guard#2%0)
z#4%0 == 0
y#4%0 == z#4%0
x#4%0 == y#4%0
x#phi5%0 == (guard#4%0 ? x#4%0 : x#2%0)
y#phi5%0 == (guard#4%0 ? y#4%0 : y#2%0)
z#phi5%0 == (guard#4%0 ? z#4%0 : z#2%0)

//merge variables from various loop exits
enable#1 ==> (guard#1 == guard#1%0)
enable#1 ==> (w#phi1 == w#phi1%0)
enable#1 ==> (x#phi1 == x#phi1%0)
enable#1 ==> (y#phi1 == y#phi1%0)
enable#1 ==> (z#phi1 == z#phi1%0)

guard#34 == (!TRUE && guard#1) //after loop exit

guard#2%0 ==> 3 + z#phi5%0 >= x#phi5%0 //assertion

For a further iteration we add the following. Note that we unwind backwards
by inserting new unwindings before the old ones. Also note that formula unwinding
generates an exponential blow-up of the SSA formula in the depth of
loop nesting.

//loop head of 1st unwinding
enable#1 ==> (guard#1%1 == (x#0 < 0 && x#0 == y#0 && y#0 == z#0 && x#0 >= -10))
enable#1 ==> (w#phi1%1 == (guard#ls5%1 ? w#lb5%1 : w#0))
enable#1 ==> (x#phi1%1 == (guard#ls5%1 ? x#lb5%1 : x#0))
enable#1 ==> (y#phi1%1 == (guard#ls5%1 ? y#lb5%1 : y#0))
enable#1 ==> (z#phi1%1 == (guard#ls5%1 ? z#lb5%1 : z#0))

guard#2%1 == (TRUE && guard#1%1)
z#2%1 == -y#phi1%1
y#2%1 == -x#phi1%1
w#2%1 == 1 + w#phi1%1
x#2%1 == w#2%1 + x#phi1%1

guard#3%1 == (!(w#2%1 % 2 == 1) && guard#2%1)
w#3%1 == w#2%1 / 3
w#phi4%1 == (guard#3%1 ? w#3%1 : w#2%1)

guard#4%1 == ((x#2%1 >= 10) && guard#2%1)
z#4%1 == 0
y#4%1 == z#4%1
x#4%1 == y#4%1
x#phi5%1 == (guard#4%1 ? x#4%1 : x#2%1)
y#phi5%1 == (guard#4%1 ? y#4%1 : y#2%1)
z#phi5%1 == (guard#4%1 ? z#4%1 : z#2%1)

//stitch together 0th and 1st unwinding
enable#1 ==> (guard#1%0 == guard#2%1)
enable#1 ==> (w#phi1%0 == w#phi4%1)
enable#1 ==> (x#phi1%0 == x#phi5%1)
enable#1 ==> (y#phi1%0 == y#phi5%1)
enable#1 ==> (z#phi1%0 == z#phi5%1)

//merge variables from various loop exits
enable#1 ==> (guard#1 == (!guard#2%1 ? guard#1%1 : guard#1%0))
enable#1 ==> (w#phi1 == (!guard#2%1 ? w#phi1%1 : w#phi1%0))
enable#1 ==> (x#phi1 == (!guard#2%1 ? x#phi1%1 : x#phi1%0))
enable#1 ==> (y#phi1 == (!guard#2%1 ? y#phi1%1 : y#phi1%0))
enable#1 ==> (z#phi1 == (!guard#2%1 ? z#phi1%1 : z#phi1%0))

guard#2%1 ==> 3 + z#phi5%1 >= x#phi5%1 //assertion


Obviously, unwinding further does not help for this example. This is why 
IBMC will not prove the property. 

k-induction. For proving that the property is k-inductive, we assume the property
for each unwinding j < k by adding guard#ls5%j && (guard#2%j ==>
3 + z#phi5%j >= x#phi5%j) to the formula. However, the property is not
k-inductive on this example.
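
The failure of the inductive step can be illustrated by a brute-force search for a counterexample to induction: an arbitrary loop-head state from which the assertion holds for k consecutive iterations but fails in the next one. The C sketch below is a simplification under stated assumptions: it enumerates plain `int` states in a small box instead of solving the bit-vector formula, and the names `state_t`, `step` and `find_cti` are hypothetical.

```c
#include <stdbool.h>

typedef struct { int w, x, y, z; } state_t;

/* Execute one iteration of the example loop body on *s.
 * Returns true iff the assertion x <= z + 3 holds afterwards. */
bool step(state_t *s)
{
  s->z = -s->y;
  s->y = -s->x;
  s->w++;
  s->x = s->x + s->w;
  if (s->w % 2 != 1) s->w /= 3;
  if (s->x >= 10) s->x = s->y = s->z = 0;
  return s->x <= s->z + 3;
}

/* Search the box [-r, r]^4 of loop-head states for a counterexample to
 * k-induction. On success, stores the starting state in *out. */
bool find_cti(int k, int r, state_t *out)
{
  for (int w = -r; w <= r; w++)
    for (int x = -r; x <= r; x++)
      for (int y = -r; y <= r; y++)
        for (int z = -r; z <= r; z++) {
          state_t start = { w, x, y, z };
          state_t s = start;
          bool ok = true;
          for (int j = 0; j < k && ok; j++) ok = step(&s);
          if (ok && !step(&s)) { *out = start; return true; }
        }
  return false;
}
```

For instance, from the (unreachable) state w = 5, x = 4 the assertion holds in the first two iterations and fails in the third, so the inductive step for k = 2 cannot succeed without an additional invariant that excludes such states.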

kIkI additionally uses k-inductive invariants. We infer the following invariant,
which, together with the assumptions from k-induction above, allows us to prove
the property for k = 2 on this example.

guard#2%2 && guard#ls5%2 ==>
  w#lb5%2 <= 1 && -((signed __CPROVER_bitvector[33])w#lb5%2) <= 0 &&
  x#lb5%2 <= 9 && -((signed __CPROVER_bitvector[33])x#lb5%2) <= 10 &&
  y#lb5%2 <= 7 && -((signed __CPROVER_bitvector[33])y#lb5%2) <= 10 &&
  z#lb5%2 <= 6 && -((signed __CPROVER_bitvector[33])z#lb5%2) <= 10





B Further Results 


In addition to Table 1, Table 2 gives results for an extension of CPAchecker
supporting k-induction discussed in the research report [I]. They use a classical
abstract interpreter to generate auxiliary invariants. Their run times are similar
to the CPAchecker SVCOMP’15 version, but the extension is much less complete regarding
proofs than kIkI. They only use 1-invariants instead of k-invariants, yet they use
increasingly more precise abstract domains. Moreover, the extension seems to
be still under development as the number of false proofs and alarms suggests.

Table 2: Comparison between kIkI, the algorithms it subsumes, the portfolio,
CPAchecker (SVCOMP’15), ESBMC and CPAchecker (k-induction). The rows
false alarms and false proofs indicate soundness bugs of the tool implementations.










